- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.48)
Last-Iterate Analyses of FTRL in Stochastic Bandits
Zhan, Jingxin, Han, Yuze, Zhang, Zhihua
The convergence analysis of online learning algorithms is central to machine learning theory, where last-iterate convergence is particularly important: it captures the learner's actual decisions and describes the evolution of the learning process over time. However, in multi-armed bandits, most existing analyses focus on the order of the regret, while the last-iterate (simple regret) convergence rate remains less explored -- especially for the widely studied Follow-the-Regularized-Leader (FTRL) algorithms. Recently, a growing line of work has established the Best-of-Both-Worlds (BOBW) property of FTRL algorithms in bandit problems, showing in particular that they achieve logarithmic regret in stochastic bandits. Nevertheless, their last-iterate convergence rate has not yet been studied. Intuitively, logarithmic regret should correspond to a $t^{-1}$ last-iterate convergence rate. This paper partially confirms this intuition through theoretical analysis: the Bregman divergence, defined by the regularizer $\Psi(p) = -4\sum_{i=1}^{d}\sqrt{p_i}$ associated with the BOBW FTRL algorithm $1/2$-Tsallis-INF (arXiv:1807.07623), between the point mass on the optimal arm and the probability distribution over the arm set obtained at iteration $t$, decays at a rate of $t^{-1/2}$.
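As a concrete illustration (not taken from the paper), the Bregman divergence under the $1/2$-Tsallis regularizer $\Psi(p) = -4\sum_i \sqrt{p_i}$ can be evaluated directly from its definition $D_\Psi(x, y) = \Psi(x) - \Psi(y) - \langle \nabla\Psi(y), x - y \rangle$. The arm count and distributions below are hypothetical examples:

```python
import math

def tsallis_half_potential(p):
    # Psi(p) = -4 * sum_i sqrt(p_i), the 1/2-Tsallis regularizer from the abstract
    return -4.0 * sum(math.sqrt(pi) for pi in p)

def bregman_divergence(x, y):
    # D_Psi(x, y) = Psi(x) - Psi(y) - <grad Psi(y), x - y>,
    # with grad Psi(y)_i = -2 / sqrt(y_i); requires y_i > 0.
    grad = [-2.0 / math.sqrt(yi) for yi in y]
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad, x, y))
    return tsallis_half_potential(x) - tsallis_half_potential(y) - inner

# Point mass on arm 0 versus a distribution concentrating on arm 0.
e_star = [1.0, 0.0, 0.0]
p_t = [0.9, 0.05, 0.05]
d = bregman_divergence(e_star, p_t)
# d >= 0 (Psi is convex) and shrinks as p_t concentrates on the optimal arm,
# which is the quantity the paper shows decays at rate t^{-1/2}.
```

Since $\Psi$ is convex, the divergence is nonnegative and vanishes exactly when $p_t$ equals the point mass on the optimal arm.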
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.49)
Adaptive Learning Rate for Follow-the-Regularized-Leader: Competitive Analysis and Best-of-Both-Worlds
Ito, Shinji, Tsuchiya, Taira, Honda, Junya
Follow-The-Regularized-Leader (FTRL) is known as an effective and versatile approach in online learning, where an appropriate choice of the learning rate is crucial for small regret. To this end, we formulate the problem of adjusting FTRL's learning rate as a sequential decision-making problem and introduce the framework of competitive analysis. We establish a lower bound on the competitive ratio and propose update rules for the learning rate that achieve an upper bound within a constant factor of this lower bound. Specifically, we show that the optimal competitive ratio is characterized by the (approximate) monotonicity of the components of the penalty term: a constant competitive ratio is achievable if the components of the penalty term form a monotonically non-increasing sequence, and a tight competitive ratio is derived when the penalty terms are $\xi$-approximately monotone non-increasing. Our proposed update rule, referred to as \textit{stability-penalty matching}, also facilitates constructing Best-of-Both-Worlds (BOBW) algorithms for stochastic and adversarial environments. In these environments, our results yield tighter regret bounds and broaden the applicability of algorithms to various settings, such as multi-armed bandits, graph bandits, linear bandits, and contextual bandits.
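To make the role of the learning rate concrete, here is a minimal sketch of FTRL over the simplex with the negative-entropy regularizer, where the update reduces to exponential weights $p_i \propto \exp(-\eta_t L_i)$. The schedule $\eta_t = \sqrt{\log d / t}$ is a standard generic decreasing rate used for illustration only; it is an assumption, not the paper's stability-penalty matching rule:

```python
import math

def ftrl_entropy_step(cum_losses, eta):
    # FTRL with negative-entropy regularizer over the simplex reduces to
    # exponential weights: p_i proportional to exp(-eta * L_i).
    weights = [math.exp(-eta * L) for L in cum_losses]
    z = sum(weights)
    return [w / z for w in weights]

def run_ftrl(loss_rows):
    # Play FTRL with a generic decreasing rate eta_t = sqrt(log d / t)
    # (illustrative stand-in for an adaptive learning-rate rule).
    d = len(loss_rows[0])
    cum = [0.0] * d
    dists = []
    for t, losses in enumerate(loss_rows, start=1):
        eta = math.sqrt(math.log(d) / t)
        p = ftrl_entropy_step(cum, eta)
        dists.append(p)
        cum = [c + l for c, l in zip(cum, losses)]
    return dists

# Arm 0 consistently suffers less loss, so the played distribution
# should shift mass toward it over time.
dists = run_ftrl([[0.1, 0.9]] * 20)
```

The paper's contribution can be read as replacing the fixed schedule above with an update rule whose rate is chosen competitively against the best schedule in hindsight.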
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.68)
On Regret-optimal Cooperative Nonstochastic Multi-armed Bandits
Coordinating multiple agents that can communicate with each other to make decisions under uncertainty is a classical problem with many applications in computer science (Lynch, 1996), game theory (Chakravarty et al., 2014), and machine learning (Lanctot et al., 2017). We consider the multi-agent version of the multi-armed bandit problem, one of the most fundamental decision-making problems under uncertainty. In this problem, a learning agent needs to manage the exploration-exploitation trade-off, i.e., balancing the exploration of various actions in order to learn how rewarding they are against the selection of high-rewarding actions. In the multi-agent version, multiple agents collaborate with each other to maximize their individual cumulative rewards, and the challenge is to design efficient cooperative algorithms under communication constraints. We consider the nonstochastic (adversarial) multi-armed bandit problem in a cooperative multi-agent setting, with $K \geq 2$ arms and $N \geq 1$ agents.